Decision Trees for Lexical Smoothing in Statistical Machine Translation

نویسندگان

  • Rabih Zbib
  • Spyridon Matsoukas
  • Richard M. Schwartz
  • John Makhoul
چکیده

We present a method for incorporating arbitrary context-informed word attributes into statistical machine translation by clustering attribute-quali ed source words, and smoothing their word translation probabilities using binary decision trees. We describe two ways in which the decision trees are used in machine translation: by using the attribute-quali ed source word clusters directly, or by using attributedependent lexical translation probabilities that are obtained from the trees, as a lexical smoothing feature in the decoder model. We present experiments using Arabic-to-English newswire data, and using Arabic diacritics and part-ofspeech as source word attributes, and show that the proposed method improves on a state-of-the-art translation

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

متن کامل

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

متن کامل

Lexical Features for Statistical Machine Translation

Title of dissertation: LEXICAL FEATURES FOR STATISTICAL MACHINE TRANSLATION Jacob Devlin, Master of Science, 2009 Dissertation directed by: Professor Bonnie Dorr Department of Computer Science In modern phrasal and hierarchical statistical machine translation systems, two major features model translation: rule translation probabilities and lexical smoothing scores. The rule translation probabil...

متن کامل

Suffix Trees as Language Models

Suffix trees are data structures that can be used to index a corpus. In this paper, we explore how some properties of suffix trees naturally provide the functionality of an n-gram language model with variable n. We explain how we leverage these properties of suffix trees for our Suffix Tree Language Model (STLM) implementation and explain how a suffix tree implicitly contains the data needed fo...

متن کامل

The RWTH Aachen Machine Translation System for WMT 2012

This paper describes the statistical machine translation (SMT) systems developed at RWTH Aachen University for the translation task of the NAACL 2012 Seventh Workshop on Statistical Machine Translation (WMT 2012). We participated in the evaluation campaign for the French-English and German-English language pairs in both translation directions. Both hierarchical and phrase-based SMT systems are ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010